Harvard and MIT founded edX, a major online education provider in the United States. It offers university-level courses in a variety of fields to a global student base, with some courses available for free. With over 3000 courses and over 1.4 million certifications granted to students, Harvard and MIT have played an instrumental role in the development of a thriving market for college-level content. Using the Kaggle dataset provided for this Hackathon challenge, we set out to infer a few details about this huge domain and to build visualizations that better showcase the content of the dataset. We also worked with several libraries that were new to us, such as zoo, which supports calculations on time series of numeric vectors, matrices and factors, and Highcharter in R, a very flexible and customizable charting library.
The visualizations in this report are a blend of charts and plots built from dataset columns such as gender, course subject, course launch date and audited courses. The first section covers the course subjects offered by Harvard and MIT and the % Audited broken down by institution and course subject. The second section interprets course subjects against various factors, using box plots, heat maps and highcharts to show the trends. The third section covers the count of participants (course content accessed), which we visualize by course, by year and by students who accessed 50% of the course. We also visualize the gender distribution within a particular course.
The two highcharts here show the number of courses offered by MIT and Harvard respectively, broken down by category. MIT offers almost 50% of its courses under Science, Technology, Engineering and Mathematics, and very few under Humanities, History, Design, Religion and Education. Harvard, on the other hand, offers mostly Humanities, History, Design, Religion and Education courses.
The first stacked bar plot shows that a larger share of the courses offered by MIT is audited. Similarly, the next bar plot shows the different course subjects and the percentage of each that is audited. Science, Technology, Engineering and Mathematics courses are the most audited, which suggests that students are more inclined towards science and technology courses. The institutions should encourage students to study other subjects too.
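The audited percentages behind such bar plots can be computed by grouping courses and dividing the audited count by the participant count. The report does not state how the percentages were aggregated, and the charts themselves were built in R, so the following is only a minimal Python sketch of one possible aggregation. The field names (`institution`, `participants`, `audited`) and all values are hypothetical and are not taken from the Kaggle dataset.

```python
from collections import defaultdict

# Hypothetical course records; the numbers are invented for illustration only.
courses = [
    {"institution": "MIT", "participants": 1000, "audited": 400},
    {"institution": "MIT", "participants": 3000, "audited": 600},
    {"institution": "Harvard", "participants": 2000, "audited": 300},
]


def audited_share(records, key):
    """Return the audited percentage per group, weighted by participants."""
    participants = defaultdict(int)
    audited = defaultdict(int)
    for record in records:
        participants[record[key]] += record["participants"]
        audited[record[key]] += record["audited"]
    # Skip groups with no participants to avoid dividing by zero.
    return {
        group: 100.0 * audited[group] / participants[group]
        for group in participants
        if participants[group] > 0
    }


shares = audited_share(courses, "institution")
for institution, pct in sorted(shares.items()):
    print(f"{institution}: {pct:.1f}% audited")
```

Passing a different key, for example a course subject field, would give the per-subject breakdown in the same way. Weighting by participants is a design choice here; averaging the per-course percentages instead would give different numbers.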
We have used box plots to answer these questions. The first box plot shows that Computer Science courses are the most certified, while Science, Technology, Engineering and Mathematics is among the least certified. The next box plot shows the percentage posted in the forum for each course subject, where Humanities, History, Design, Religion and Education is the highest. The third plot shows the total course hours for each course subject, where Computer Science is the clear leader. The last box plot summarizes the median hours for certification across the course subjects: Science, Technology, Engineering and Mathematics and Computer Science are quite close, although STEM courses require close to 50 hours more.
The heat map shows the trend for each course subject in terms of the number of course hours offered from 2012 to 2016. The course subjects typically fall in a range of about 200 to 400 course hours. Two further observations stand out: Humanities, History, Design, Religion and Education had no course hours released in 2012, while in the same year Government, Health and Social Science had the highest number of course hours released, at more than 800.
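A heat map of this kind is drawn from a subject-by-year grid of summed course hours. The report's own chart was made in R; the Python sketch below only illustrates the aggregation step, under the assumption that each course contributes one (subject, launch year, hours) row. The rows are invented for illustration and do not come from the dataset.

```python
from collections import defaultdict

# Hypothetical rows: (course_subject, launch_year, total_course_hours).
# All values are made up for illustration only.
rows = [
    ("Computer Science", 2013, 120.0),
    ("Computer Science", 2013, 80.0),
    ("Computer Science", 2014, 150.0),
    ("Government, Health and Social Science", 2012, 300.0),
    ("Government, Health and Social Science", 2014, 90.0),
]


def hours_grid(records):
    """Sum course hours per (subject, year) and return a dense grid.

    Combinations with no courses are filled with 0.0 so that every
    subject has a value for every year, which is what a heat map needs.
    """
    totals = defaultdict(float)
    for subject, year, hours in records:
        totals[(subject, year)] += hours
    subjects = sorted({subject for subject, _, _ in records})
    years = sorted({year for _, year, _ in records})
    grid = {
        subject: {year: totals.get((subject, year), 0.0) for year in years}
        for subject in subjects
    }
    return grid, years


grid, years = hours_grid(rows)
for subject, by_year in grid.items():
    print(subject, [by_year[year] for year in years])
```

Filling empty cells with zero matches the way a subject with no course hours in a given year appears in the heat map.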
The highchart shows that STEM courses are preferred by participants with a median age of 25. Similarly, Computer Science is preferred at a median age of 27 to 28, Government, Health and Social Science is most accessed at a median age of 30, and Humanities, History, Design, Religion and Education is popular at a median age of 31.
To answer this business question, we used an alluvial chart to show the trend of the course subjects accessed across the years. There is a good blend in the courses accessed from both MIT and Harvard. However, Computer Science and Science, Technology, Engineering and Mathematics courses are mostly preferred from MIT, whereas Government, Health and Social Science and Humanities, History, Design, Religion and Education courses are mostly preferred from Harvard.
The scatter plot here shows the relationship between two variables, Participants and Audited courses. The results show a moderately strong positive relationship between the two.
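The strength of a relationship like this is usually quantified with the Pearson correlation coefficient. The report does not say which measure, if any, was computed, so the following is a minimal Python sketch under that assumption, using made-up numbers rather than the dataset's values.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Made-up illustrative values, NOT taken from the Kaggle dataset.
participants = [1000, 2500, 4000, 8000, 12000]
audited = [600, 1400, 2900, 4100, 9000]

r = pearson_r(participants, audited)
print(f"Pearson r = {r:.2f}")
```

A value near +1 indicates a strong positive linear relationship, a value near 0 indicates little linear relationship, and a value near -1 indicates a strong negative one.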
We used density plots to answer this question. They show that roughly 80% or more of the male population prefers Computer Science, while for the female population the density plot indicates approximately 20%. This suggests that the female population should be encouraged to study Computer Science.
During this Hackathon we worked as a team on the dataset Online Courses by MIT and Harvard from Kaggle. The aim was to create clear and concise interpretations of the data, to better understand it and all its attributes, and to improve our data visualization skills in R. We used several libraries in the course of this assignment: ggalluvial, whose geom_alluvium creates alluvial charts; Highcharts in R, for interactive and dynamic charts; and lubridate, which makes it easy to work with dates.
Some of the key findings from the dataset are that MIT offers almost 50% of its courses under Science, Technology, Engineering and Mathematics and very few under Humanities, History, Design, Religion and Education, whereas Harvard offers more than 50% of its courses under the latter. The alluvial chart shows that Computer Science and Science, Technology, Engineering and Mathematics courses are mostly preferred from MIT, whereas Government, Health and Social Science and Humanities, History, Design, Religion and Education courses are mostly preferred from Harvard. We also have a heat map, which is another way to visualize hierarchical clustering and, in our case, produces a few useful insights on the course hours of all the course subjects from 2012 to 2016. In conclusion, this assignment has better equipped us to choose the right kind of chart for the data being compared or correlated. It has also familiarized us with the wide range of color palettes that can be used in R to beautify our plots and graphs.
From the visualizations, we interpret the following:
Female population should be encouraged to study Science and Technology courses.
Harvard should focus on developing Science and Technology courses, as students are choosing MIT over Harvard, and MIT should focus on Government, History and Design courses.
Many students access course content but do not complete it. Institutions should focus on helping students finish their courses.